MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning
Mishani, Itamar, Shaoul, Yorai, Likhachev, Maxim
Planning long-horizon manipulation motions using a set of predefined skills is a central challenge in robotics; solving it efficiently could enable general-purpose robots to tackle novel tasks by flexibly composing generic skills. Solutions to this problem lie in an infinitely vast space of parameterized skill sequences -- a space where common incremental methods struggle to find sequences with non-obvious intermediate steps. Some approaches reason over lower-dimensional, symbolic spaces, which are more tractable to explore but may be brittle and are laborious to construct. In this work, we introduce MOSAIC, a skill-centric, multi-directional planning approach that targets these challenges by reasoning about which skills to employ and where they are most likely to succeed, utilizing physics simulation to estimate skill execution outcomes. Specifically, MOSAIC employs two complementary skill families: Generators, which identify "islands of competence" where skills are demonstrably effective, and Connectors, which link these skill trajectories by solving boundary value problems. By focusing planning efforts on regions of high competence, MOSAIC efficiently discovers physically grounded solutions. We demonstrate its efficacy on complex long-horizon problems in both simulation and the real world, using a diverse set of skills including generative diffusion models, motion planning algorithms, and manipulation-specific models. Visit skill-mosaic.github.io for demonstrations and examples.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.88)
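The Generator/Connector decomposition described in the abstract above can be illustrated with a toy sketch on a 1-D state space. This is not the authors' code: `Segment`, `generator`, and `connector` are invented names, and the "boundary value problem" is reduced to straight-line interpolation.

```python
# Hypothetical sketch of MOSAIC-style planning on a toy 1-D state space:
# a Generator proposes a skill trajectory inside its "island of competence"
# (here, near the goal), and a Connector links boundary states.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float
    end: float

def generator(goal):
    """Propose a skill trajectory where the skill is competent (toy: it can
    reliably cover the last 2.0 units before the goal)."""
    return Segment(start=goal - 2.0, end=goal)

def connector(a_end, b_start):
    """Solve a boundary value problem linking two trajectory endpoints
    (toy: a straight-line segment between the two boundary states)."""
    return Segment(start=a_end, end=b_start)

def plan(start, goal):
    island = generator(goal)                 # trajectory ending at the goal
    bridge = connector(start, island.start)  # link the start state to it
    return [bridge, island]

path = plan(0.0, 10.0)
```

The key property the sketch preserves is continuity: each connector's end state coincides with the next generator segment's start state.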
ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning
Choi, Jae-Woo, Kim, Hyungmin, Ong, Hyobin, Jang, Minsu, Kim, Dohyung, Kim, Jaehong, Yoon, Youngwoo
Recent advancements in large language models (LLMs) have enabled significant progress in decision-making and task planning for embodied autonomous agents. However, most existing methods still struggle with complex, long-horizon tasks because they rely on a monolithic trajectory that entangles all past decisions and observations, attempting to solve the entire task in a single unified process. To address this limitation, we propose ReAcTree, a hierarchical task-planning method that decomposes a complex goal into more manageable subgoals within a dynamically constructed agent tree. Each subgoal is handled by an LLM agent node capable of reasoning, acting, and further expanding the tree, while control flow nodes coordinate the execution strategies of agent nodes. In addition, we integrate two complementary memory systems: each agent node retrieves goal-specific, subgoal-level examples from episodic memory and shares environment-specific observations through working memory. Experiments on the WAH-NL and ALFRED datasets demonstrate that ReAcTree consistently outperforms strong task-planning baselines such as ReAct across diverse LLMs. Notably, on WAH-NL, ReAcTree achieves a 61% goal success rate with Qwen 2.5 72B, nearly doubling ReAct's 31%.
- Research Report (0.64)
- Workflow (0.46)
- Consumer Products & Services (0.71)
- Health & Medicine (0.70)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
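The agent-tree expansion described in the ReAcTree abstract above can be sketched as a recursive decomposition, with a control-flow "sequence" node requiring all child subgoals to succeed in order. This is an illustrative toy, not the authors' implementation: `decompose` is a stand-in for an LLM call, and the string-splitting heuristic is invented.

```python
# Toy sketch of a dynamically constructed agent tree: agent nodes either act
# on a subgoal directly (leaf) or expand into child subgoals, coordinated by
# a sequence-style control-flow node.
def decompose(subgoal):
    """Hypothetical LLM stand-in: split a compound goal into subgoals,
    or return None to act directly."""
    return subgoal.split(" and ") if " and " in subgoal else None

def execute(subgoal, log):
    children = decompose(subgoal)
    if children is None:          # leaf agent node: reason + act
        log.append(subgoal)
        return True
    # control-flow "sequence" node: every child must succeed, in order
    return all(execute(child, log) for child in children)

log = []
ok = execute("pick up the mug and place it on the shelf", log)
```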
SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing
Haworth, Jesse, Chen, Juo-Tung, Nelson, Nigel, Kim, Ji Woong, Moghani, Masoud, Finn, Chelsea, Krieger, Axel
Robotic suturing is a prototypical long-horizon dexterous manipulation task, requiring coordinated needle grasping, precise tissue penetration, and secure knot tying. Despite numerous efforts toward end-to-end autonomy, a fully autonomous suturing pipeline has yet to be demonstrated on physical hardware. We introduce SutureBot: an autonomous suturing benchmark on the da Vinci Research Kit (dVRK), spanning needle pickup, tissue insertion, and knot tying. To ensure repeatability, we release a high-fidelity dataset comprising 1,890 suturing demonstrations. Furthermore, we propose a goal-conditioned framework that explicitly optimizes insertion-point precision, improving targeting accuracy by 59%-74% over a task-only baseline. To establish this task as a benchmark for dexterous imitation learning, we evaluate state-of-the-art vision-language-action (VLA) models, including π0, GR00T N1, OpenVLA-OFT, and multitask ACT, each augmented with a high-level task-prediction policy. Autonomous suturing is a key milestone toward achieving robotic autonomy in surgery. These contributions support reproducible evaluation and development of precision-focused, long-horizon dexterous manipulation policies necessary for end-to-end suturing. Dataset is available at: https://huggingface.co/datasets/jchen396/suturebot
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Health Care Technology (0.93)
- Health & Medicine > Surgery (0.67)
A Task Details
Table 5: All task variations except shape used in VLMbench. Table 6: All object models used in VLMbench.
Object models (Table 6):
- Basic model, 3 classes: cube (1), triangular prism (1), cylinder (1)
- Special model, 9 classes: star (1), moon (1), cross (1), flower (1), letter 't' (1), pencil (1), basket (1), box container (1), shape sorter (1)
- Planar model, 6 classes: rectangle (1), circle (1), triangle (1), star (1), cross (1), flower (1)
- Functional model, 2 classes: mug (6), sponge (1)
- Articulated model, 2 classes: door with one rotatable handle (2), cabinet with three vertical drawers (3)
In VLMbench, we show eight task categories, including "Pick & Place objects", "Stack objects", and "Drop ...". When building an instance-level task with one variation, the other variations will also change randomly (for example, in the demonstrations of "Pick & Place objects"). In the dataset, we have five types of objects, shown in Table 6. Visualizations can be found on the project website. The object can be placed anywhere, with any orientation, inside the container. When the detector is triggered, the task is considered a success.
Instruction templates:
- High-level instructions: "Pick up [target object description] and place it into [target container description]."
- Low-level instructions: "Move to the top of [target object description]"; "Move the object into [target container description]"; ...
Variations and scene settings: All objects randomly change color, size, and position in each demonstration.
- Color: There are two same-shape objects and two same-shape containers in the scene initialization. All colors are randomly sampled from the color library. The object description is "[color] object"; the container description is "[color] container."
- Size: There are two same-shape objects and two same-shape containers in the scene initialization. One object and one container are randomly magnified while the others are randomly shrunk.
- Relative Position: There are two same-shape objects and two same-shape containers in the scene initialization. The object description is "[front/rear/left/right] object"; the container description ...
For "Stack objects", the number of objects varies from two to the length of the object library.
- High-level instructions: "Stack [below object description] and [above object description] ..."
- Low-level instructions: "Move to the top of [above object description]"; "Move the object on [below object description]"; "Release the ..."
- Object models: In the seen settings, five object models are used: star, triangular prism, cylinder, cube, moon.
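The bracketed instruction templates above can be instantiated by simple slot substitution. A minimal sketch (the slot fillers "red object" and "blue container" are invented examples; the template string follows the text):

```python
# Fill bracketed slots like "[target object description]" in an instruction
# template with concrete descriptions.
def fill(template, slots):
    out = template
    for name, value in slots.items():
        out = out.replace(f"[{name}]", value)
    return out

high_level = ("Pick up [target object description] and place it into "
              "[target container description].")
instr = fill(high_level, {
    "target object description": "red object",
    "target container description": "blue container",
})
```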
Grounding Language Models with Semantic Digital Twins for Robotic Planning
Naeem, Mehreen, Melnik, Andrew, Beetz, Michael
We introduce a novel framework that integrates Semantic Digital Twins (SDTs) with Large Language Models (LLMs) to enable adaptive and goal-driven robotic task execution in dynamic environments. The system decomposes natural language instructions into structured action triplets, which are grounded in contextual environmental data provided by the SDT. This semantic grounding allows the robot to interpret object affordances and interaction rules, enabling action planning and real-time adaptability. In case of execution failures, the LLM utilizes error feedback and SDT insights to generate recovery strategies and iteratively revise the action plan. We evaluate our approach using tasks from the ALFRED benchmark, demonstrating robust performance across various household scenarios. The proposed framework effectively combines high-level reasoning with semantic environment understanding, achieving reliable task completion in the face of uncertainty and failure.
- Research Report (0.68)
- Workflow (0.46)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.63)
MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
Mohammadi, Mohammad, Honerkamp, Daniel, Büchner, Martin, Cassinelli, Matteo, Welschehold, Tim, Despinoy, Fabien, Gilitschenski, Igor, Valada, Abhinav
Autonomous long-horizon mobile manipulation encompasses a multitude of challenges, including scene dynamics, unexplored areas, and error recovery. Recent works have leveraged foundation models for scene-level robotic reasoning and planning. However, the performance of these methods degrades when dealing with a large number of objects and large-scale environments. To address these limitations, we propose MORE, a novel approach for enhancing the capabilities of language models to solve zero-shot mobile manipulation planning for rearrangement tasks. MORE leverages scene graphs to represent environments, incorporates instance differentiation, and introduces an active filtering scheme that extracts task-relevant subgraphs of object and region instances. These steps yield a bounded planning problem, effectively mitigating hallucinations and improving reliability. Additionally, we introduce several enhancements that enable planning across both indoor and outdoor environments. We evaluate MORE on 81 diverse rearrangement tasks from the BEHAVIOR-1K benchmark, where it becomes the first approach to successfully solve a significant share of the benchmark, outperforming recent foundation model-based approaches. Furthermore, we demonstrate the capabilities of our approach in several complex real-world tasks, mimicking everyday activities. We make the code publicly available at https://more-model.cs.uni-freiburg.de.
- Europe > Germany > Baden-Württemberg > Freiburg (0.24)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Netherlands > South Holland > Delft (0.04)
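The active-filtering step described in the MORE abstract above can be sketched as extracting the task-relevant subgraph of a scene graph. This is a hedged toy, not the authors' method: the scene graph, instance labels, and the word-matching heuristic are all invented for illustration.

```python
# Toy active filtering: keep only scene-graph object instances whose class
# name appears in the task description, plus the regions they occupy,
# yielding a bounded planning problem.
scene_graph = {                     # instance -> region (invented example)
    "mug_1": "kitchen", "mug_2": "office",
    "book_1": "office", "sofa_1": "living_room",
}

def task_relevant_subgraph(graph, task):
    words = task.lower().split()
    # instance names are "<class>_<id>"; match the class against task words
    keep = {n: r for n, r in graph.items() if n.split("_")[0] in words}
    regions = set(keep.values())
    return keep, regions

sub, regions = task_relevant_subgraph(scene_graph, "bring the mug to the office")
```

Note how instance differentiation is preserved: both `mug_1` and `mug_2` survive the filter, so the planner can still choose between them.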
PEA: Enhancing LLM Performance on Computational-Reasoning Tasks
Wang, Zi, Weng, Shiwei, Alhanahnah, Mohannad, Jha, Somesh, Reps, Tom
Large Language Models (LLMs) have exhibited significant generalization capabilities across diverse domains, prompting investigations into their potential as generic reasoning engines. Recent studies have explored inference-time computation techniques [Welleck et al., 2024, Snell et al., 2024], particularly prompt engineering methods such as Chain-of-Thought (CoT), to enhance LLM performance on complex reasoning tasks [Wei et al., 2022]. These approaches have successfully improved model performance and expanded LLMs' practical applications. However, despite the growing focus on enhancing model capabilities through inference-time computation for complex reasoning tasks, the current literature lacks a formal framework to precisely describe and characterize the complexity of reasoning problems. This study identifies a class of reasoning problems, termed computational reasoning problems, which are particularly challenging for LLMs [Yao et al., 2023, Hao et al., 2024, Valmeekam et al., 2023], such as planning problems and arithmetic games. Informally, these problems can be accurately described using succinct programmatic representations. We propose a formal framework to describe and algorithmically solve these problems. The framework employs first-order logic, equipped with efficiently computable predicates and finite domains.
- North America > United States > Virginia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
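The PEA abstract above describes problems stated in first-order logic with efficiently computable predicates over finite domains. A toy illustration of that framing (our own example, not the paper's framework): solve an exists-forall formula by enumeration over the finite domain.

```python
# Computational-reasoning toy: decide  exists x . forall y . P(x, y)
# over a finite domain, where P is an efficiently computable predicate.
domain = range(0, 10, 2)           # finite domain: even numbers 0..8

def predicate(x, y):
    """Efficiently computable predicate: x + y is even."""
    return (x + y) % 2 == 0

# brute-force enumeration is sound and complete on a finite domain
witnesses = [x for x in domain if all(predicate(x, y) for y in domain)]
```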
Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting
Zhang, Chaoyuan, Li, Zhaowei, Yuan, Wentao
Integrating large language models (LLMs) into closed-loop robotic task planning has become increasingly popular within embodied artificial intelligence. Previous efforts mainly focused on leveraging the strong reasoning abilities of LLMs to enhance task planning performance while often overlooking task planning efficiency and executability due to repetitive queries to LLMs. This paper addresses the synergy between LLMs and task planning systems, aiming to minimize redundancy while enhancing planning effectiveness. Specifically, building upon Prog-Prompt and the high-level concept of Tree-Planner, we propose Vote-Tree-Planner. This sampling strategy utilizes votes to guide plan traversal during the decision-making process. Our approach is motivated by a straightforward observation: assigning weights to agents during decision-making enables the evaluation of critical paths before execution. With this simple vote-tree construction, our method further improves the success rate and reduces the number of queries to LLMs. The experimental results highlight that our Vote-Tree-Planner demonstrates greater stability and shows a higher average success rate and goal condition recall on the unseen dataset compared with previous baseline methods. These findings underscore the potential of the Vote-Tree-Planner to enhance planning accuracy, reliability, and efficiency in LLM-based planning systems.
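The vote-guided traversal described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the sampled plans are invented, and each plan casts one vote per step, with traversal greedily following the highest-vote child at each depth.

```python
# Toy vote-tree traversal: several sampled plans (lists of action strings)
# vote on each step; we follow the majority action and keep only the plans
# consistent with the path so far.
from collections import Counter

sampled_plans = [
    ["open fridge", "grab milk", "close fridge"],
    ["open fridge", "grab milk", "pour milk"],
    ["open fridge", "grab juice", "close fridge"],
]

def vote_traverse(plans):
    path, depth = [], 0
    while any(len(p) > depth for p in plans):
        votes = Counter(p[depth] for p in plans if len(p) > depth)
        step, _ = votes.most_common(1)[0]       # highest-vote child
        path.append(step)
        plans = [p for p in plans if len(p) > depth and p[depth] == step]
        depth += 1
    return path

plan = vote_traverse(sampled_plans)
```

Because the tree is built once from sampled plans, the critical path is evaluated before execution, avoiding repeated LLM queries during traversal.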
Goal-Conditioned Data Augmentation for Offline Reinforcement Learning
Huang, Xingshuai, Wu, Di, Boulet, Benoit
Offline reinforcement learning (RL) enables policy learning from pre-collected offline datasets, relaxing the need to interact directly with the environment. However, limited by the quality of offline datasets, it generally fails to learn well-qualified policies in suboptimal datasets. To address datasets with insufficient optimal demonstrations, we introduce Goal-cOnditioned Data Augmentation (GODA), a novel goal-conditioned diffusion-based method for augmenting samples with higher quality. Leveraging recent advancements in generative modeling, GODA incorporates a novel return-oriented goal condition with various selection mechanisms. Specifically, we introduce a controllable scaling technique to provide enhanced return-based guidance during data sampling. GODA learns a comprehensive distribution representation of the original offline datasets while generating new data with selectively higher-return goals, thereby maximizing the utility of limited optimal demonstrations. Furthermore, we propose a novel adaptive gated conditioning method for processing noised inputs and conditions, enhancing the capture of goal-oriented guidance. We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness in enhancing data quality and superior performance compared to state-of-the-art data augmentation methods across various offline RL algorithms.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
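The return-oriented goal condition with controllable scaling described in the GODA abstract above can be caricatured without any diffusion machinery: select a goal from the upper tail of the dataset's return distribution, then scale it to steer generation toward higher returns. The returns, quantile, and scale factor below are all invented.

```python
# Toy return-oriented goal condition: take a high quantile of the observed
# returns, then apply a controllable scaling factor to push the generative
# model's conditioning beyond the best data seen so far.
returns = [1.0, 3.0, 2.0, 8.0, 5.0, 9.0, 4.0]

def goal_condition(returns, quantile=0.75, scale=1.1):
    cutoff = sorted(returns)[int(quantile * (len(returns) - 1))]
    high = [r for r in returns if r >= cutoff]   # upper tail of returns
    return scale * max(high)                     # controllable scaling

goal = goal_condition(returns)
```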